InfoXtract Location Normalization: A Hybrid Approach To Geographic References In Information Extraction
Abstract
Ambiguity is very high for location names. For example, there are 23 cities named ‘Buffalo’ in the U.S. Based on our previous work, this paper presents a refined hybrid approach to geographic references using our information extraction engine InfoXtract. The InfoXtract location normalization module consists of local pattern matching and discourse co-occurrence analysis as well as default senses. Multiple knowledge sources are used in a number of ways: (i) pattern matching driven by local context, (ii) maximum spanning tree search for discourse analysis, and (iii) applying default sense heuristics and extracting default senses from the web. The results are benchmarked with 96% accuracy on our test collections that consist of both news articles and tourist guides. The performance contribution for each component of the module is also benchmarked and discussed.
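The three knowledge sources named in the abstract can be illustrated with a small sketch. This is not the InfoXtract implementation: the gazetteer, the default senses, the scoring weights, and all function names below are hypothetical, and the discourse step uses a brute-force search over sense combinations as a simple stand-in for the maximum spanning tree search the paper describes.

```python
import re
from itertools import product

# Hypothetical toy gazetteer: name -> candidate senses (city, state, country).
GAZETTEER = {
    "Buffalo": [("Buffalo", "NY", "USA"), ("Buffalo", "WY", "USA"), ("Buffalo", "MN", "USA")],
    "Albany": [("Albany", "NY", "USA"), ("Albany", "GA", "USA")],
    "Rochester": [("Rochester", "NY", "USA"), ("Rochester", "MN", "USA")],
}

# Hypothetical default senses (the paper derives these from heuristics and the web).
DEFAULT_SENSE = {
    "Buffalo": ("Buffalo", "NY", "USA"),
    "Albany": ("Albany", "NY", "USA"),
    "Rochester": ("Rochester", "NY", "USA"),
}


def local_pattern(name, text):
    """Local context: resolve 'Name, ST' patterns such as 'Buffalo, NY'."""
    match = re.search(re.escape(name) + r",\s*([A-Z]{2})\b", text)
    if match:
        for sense in GAZETTEER[name]:
            if sense[1] == match.group(1):
                return sense
    return None


def compatibility(a, b):
    """Assumed co-occurrence weight: same country scores 1, same state adds 2."""
    score = 0
    if a[2] == b[2]:
        score += 1
    if a[1] == b[1]:
        score += 2
    return score


def normalize(text):
    """Pick one sense per location name mentioned in the text."""
    names = [n for n in GAZETTEER if re.search(r"\b" + re.escape(n) + r"\b", text)]
    fixed = {n: local_pattern(n, text) for n in names}
    # Names resolved by a local pattern keep that single sense.
    options = [[fixed[n]] if fixed[n] else GAZETTEER[n] for n in names]

    best, best_score = None, -1.0
    for combo in product(*options):
        # Discourse evidence: pairwise compatibility of the chosen senses.
        score = sum(
            compatibility(a, b)
            for i, a in enumerate(combo)
            for b in combo[i + 1:]
        )
        # Default senses act as a weak tie-breaker (assumed weight 0.5).
        score += sum(0.5 for n, s in zip(names, combo) if DEFAULT_SENSE.get(n) == s)
        if score > best_score:
            best, best_score = combo, score
    return dict(zip(names, best or ()))
```

In this sketch, a mention such as "Rochester, MN" is fixed by the local pattern and then pulls an otherwise ambiguous "Buffalo" in the same text toward its Minnesota sense, while a lone "Buffalo" falls back to its default sense.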
Related articles
UB at TREC 12: HARD and Genomics Tracks
University at Buffalo (UB) participated in TREC-12 in Genomics and High Accuracy Retrieval from Documents (HARD) tracks. We explored some techniques that combine Information Retrieval and Information Extraction to perform the TREC tasks. We used an Information Extraction engine InfoXtract [3] from Cymfony Inc. to enhance retrieval results. For the Genomics primary task, documents retrieved usin...
A Hybrid Approach Based on Higher Order Spectra for Clinical Recognition of Seizure and Epilepsy Using Brain Activity
Introduction: This paper proposes a reliable and efficient technique to recognize different epilepsy states, including healthy, interictal, and ictal states, using Electroencephalogram (EEG) signals. Methods: The proposed approach consists of pre-processing, feature extraction by higher order spectra, feature normalization, feature selection by genetic algorithm and ranking method, and classif...
Location Normalization for Information Extraction
Ambiguity is very high for location names. For example, there are 23 cities named ‘Buffalo’ in the U.S. Country names such as ‘Canada’, ‘Brazil’ and ‘China’ are also city names in the USA. Almost every city has a Main Street or Broadway. Such ambiguity needs to be handled before we can refer to location names for visualization of related extracted events. This paper presents a hybrid approach f...
A geographic information system for gas power plant location using analytical hierarchy process and fuzzy logic
This study recommends a GIS-based (Geographic Information Systems) and multi-criteria evaluation for site selection of gas power plant in Natanz City of Iran. The multi-criteria decision framework integrates legal requirements and physical constraints related to environmental and economic concerns. It also builds a hierarchy model for gas power plant suitability. The methodologies used for site...